Different Retrieval Models and Hybrid Term Indexing
Author
Abstract
Retrieval effectiveness depends on both the retrieval model and how terms are extracted and indexed. Chinese, Japanese and Korean text has no spaces to delimit words. Indexing with hybrid terms (i.e. words and bigrams) was not very effective in the NTCIR-II open evaluation. In this evaluation, we found that combining the 2-Poisson model with hybrid term indexing can make retrieval effective. With our pseudo-relevance feedback, performance improves to a level comparable to the best performance in the formal runs. We therefore conclude that hybrid term indexing is promising when the 2-Poisson model is used.
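A minimal sketch of the pipeline the abstract describes, under several assumptions the source does not state. Hybrid terms are taken to be overlapping character bigrams plus longer dictionary words found by greedy longest match against a hypothetical toy lexicon (`LEXICON`). The 2-Poisson model is approximated by BM25, Robertson and Walker's well-known approximation of it, with the common defaults k1 = 1.2 and b = 0.75. Pseudo-relevance feedback is a generic top-k term expansion, not the authors' own method. All function names are illustrative.

```python
import math
from collections import Counter

# Hypothetical toy lexicon (assumption); a real system would use a full dictionary.
LEXICON = {"信息", "检索", "信息检索", "模型"}
MAX_WORD_LEN = max(len(w) for w in LEXICON)


def segment_words(text):
    """Greedy forward maximum matching against LEXICON.

    Characters that start no lexicon word are skipped; they remain
    covered by the character bigrams."""
    words, i = [], 0
    while i < len(text):
        for n in range(min(MAX_WORD_LEN, len(text) - i), 0, -1):
            if text[i:i + n] in LEXICON:
                words.append(text[i:i + n])
                i += n
                break
        else:
            i += 1
    return words


def hybrid_terms(text):
    """Overlapping character bigrams plus dictionary words.

    Two-character words already occur as bigrams, so only words of other
    lengths are added (an assumption to avoid double counting)."""
    bigrams = [text[i:i + 2] for i in range(len(text) - 1)]
    return [w for w in segment_words(text) if len(w) != 2] + bigrams


def bm25_scores(q_terms, docs, k1=1.2, b=0.75):
    """Score each document with BM25, used here as the standard
    approximation to the 2-Poisson model (parameter values assumed)."""
    doc_tfs = [Counter(hybrid_terms(d)) for d in docs]
    lengths = [sum(tf.values()) for tf in doc_tfs]
    avgdl = sum(lengths) / len(docs) or 1.0
    df = Counter(t for tf in doc_tfs for t in tf)  # document frequency
    n = len(docs)
    scores = []
    for tf, dl in zip(doc_tfs, lengths):
        score = 0.0
        for t in set(q_terms):
            if t in tf:
                # "+1" inside the log keeps the IDF non-negative.
                idf = math.log((n - df[t] + 0.5) / (df[t] + 0.5) + 1)
                norm = k1 * (1 - b + b * dl / avgdl)
                score += idf * tf[t] * (k1 + 1) / (tf[t] + norm)
        scores.append(score)
    return scores


def prf_expand(q_terms, docs, top_k=1, n_new=2):
    """Generic pseudo-relevance feedback (not the authors' method):
    treat the top_k documents as relevant and append their most
    frequent hybrid terms that are not already in the query."""
    scores = bm25_scores(q_terms, docs)
    ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    pool = Counter()
    for i in ranked[:top_k]:
        pool.update(hybrid_terms(docs[i]))
    new = [t for t, _ in pool.most_common() if t not in q_terms][:n_new]
    return list(q_terms) + new


docs = ["信息检索模型", "天气很好", "检索"]
query = hybrid_terms("信息检索")
# Doc 0 scores highest; doc 1 shares no terms with the query and scores 0.
print(bm25_scores(query, docs))
print(prf_expand(query, docs))
```

Whether the original system also indexes two-character words separately from their identical bigrams is not stated in the abstract; the sketch counts them once.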
Similar resources
Hybrid Term Indexing: an Evaluation
Retrieval effectiveness depends on how terms are extracted and indexed. Chinese text (like Japanese and Korean) has no spaces to delimit words. Indexing with hybrid terms (i.e. words and bigrams) achieved the best precision among homogeneous term types, at a lower storage cost than indexing with bigrams. However, this was tested with conjunctive queries, using a sma...
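The summary above mentions evaluating hybrid terms with conjunctive queries. A minimal sketch of conjunctive (Boolean AND) query evaluation over an inverted index, assuming documents have already been converted into hybrid term lists; the toy term lists and the function names `build_index`, `intersect` and `conjunctive_query` are hypothetical.

```python
def build_index(doc_terms):
    """Map each term to a sorted posting list of document IDs."""
    index = {}
    for doc_id, terms in enumerate(doc_terms):
        for term in set(terms):
            # doc_id increases monotonically, so each list stays sorted.
            index.setdefault(term, []).append(doc_id)
    return index


def intersect(p1, p2):
    """Linear merge of two sorted posting lists."""
    i = j = 0
    out = []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            out.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return out


def conjunctive_query(index, q_terms):
    """Return IDs of documents containing every query term (Boolean AND)."""
    # Start from the shortest posting list to keep intermediate results small.
    postings = sorted((index.get(t, []) for t in set(q_terms)), key=len)
    if not postings:
        return []
    result = list(postings[0])  # copy so callers cannot mutate the index
    for p in postings[1:]:
        result = intersect(result, p)
        if not result:
            break
    return result


# Hypothetical documents, already converted to hybrid terms (words + bigrams).
doc_terms = [
    ["信息检索", "信息", "息检", "检索"],
    ["检索", "索模", "模型"],
    ["信息", "息检"],
]
index = build_index(doc_terms)
print(conjunctive_query(index, ["信息", "检索"]))  # → [0]
```

Processing the shortest posting list first is a standard optimization for AND queries; the summarized paper does not say how its queries were evaluated.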
Improved Chinese spoken document retrieval with hybrid modeling and data-driven indexing features
Different retrieval models retrieve documents based on different approaches to extracting the underlying content. Different levels of indexing features also offer different functionality and discriminative power when retrieving documents. In this paper, we present results for Chinese spoken document retrieval with hybrid models that integrate the knowledge obtainable from three basic retrieval mode...
Multidimensional term indexing for efficient processing of complex queries
The area of Information Retrieval deals with problems of storage and retrieval within huge collections of text documents. In IR models, the semantics of a document is usually characterized by a set of terms. A need common to various IR models is efficient term retrieval, provided by a term index. Existing approaches to term indexing, e.g. the inverted list, efficiently support only simp...
Content Based Radiographic Images Indexing and Retrieval Using Pattern Orientation Histogram
Introduction: Content Based Image Retrieval (CBIR) is a method of searching for and retrieving images in a database. In medical applications, physicians use CBIR to compare previous and current medical images associated with patients' pathological conditions. As the volume of pictorial information stored in medical image databases grows, efficient image indexing and retri...